Dissecting m8trix

published on 2024-07-31 by dzwdz

m8trix by HellMood is one of my favorite demos. It packs a pretty cool Matrix-style effect in only 8 bytes:

animated gif (epilepsy warning)

The author even provided the source with some comments:

org 100h

S: 
les bx,[si]     ; sets ES to the screen, assume si = 0x100
                ; 0x101 is SBB AL,9F and changes the char
                ; without CR flag, there would be
                ; no animation ;)
lahf            ; gets 0x02 (green) in the first run
                ; afterwards, it is not called again
                ; because of alignment ;)
stosw           ; print the green char ...
                ; (is also 0xAB9F and works as segment)
inc di          ; and skip one row
inc di          ;
jmp short S+1   ; repeat on 0x101 

…yeah, I didn’t really get it at first either. Let’s try to actually understand how it works (and learn some stuff about DOS along the way).

Note that I’ll be using hexadecimal numbers “by default” (without 0x) throughout this article to be consistent with DEBUG’s output.

DEBUG

The only tool I’ll be using on DOS’s side will be DEBUG. It’s a delightful little tool that ships with MS-DOS. I’ve personally used the FreeDOS version under DOSBox, as that’s what I had handy.

There’s built-in help if you type ?. You can also check out this more in-depth guide, or this video of someone using it to assemble new binaries.

There’s a small issue, though. m8trix doesn’t actually work as-is under DEBUG, for reasons I’ll explain later.

a bad explanation of segmentation

If you’re a bit rusty on how real mode segmentation works, then here’s a quick reminder. There are a few 16-bit segment registers (CS, DS, SS, ES). When you reference memory in real mode you always[1] use one of those registers, even if it’s implicit.

If you reference ES:BX, the real address this maps to is computed as ES * 0x10 + BX. This means that there are multiple ways to reference one physical memory location (even if that is only slightly relevant here).

As another example, B800:1234 points to B9234.
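To make the formula concrete, here is a tiny Python model of real-mode address translation. The `linear` helper is purely illustrative, and it ignores what happens past 1 MB (wraparound and the A20 line), which doesn’t matter for anything here.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode translation: physical address = segment * 0x10 + offset."""
    return segment * 0x10 + offset

print(hex(linear(0xB800, 0x1234)))   # 0xb9234

# Different segment:offset pairs can name the same physical byte.
print(linear(0xA000, 0x0000) == linear(0x9FFF, 0x0010))   # True
```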

the first look

My comments are prefixed with a semicolon. As mentioned, all numbers shown are in hexadecimal.
C:\M8TRIX>debug M8TRIX.COM
-U ; disassemble the beginning of the program
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
073D:0104 47                INC     DI
073D:0105 47                INC     DI
073D:0106 EBF9              JMP     0101
-U 101 ; disassemble the loop body
073D:0101 1C9F              SBB     AL,9F
073D:0103 AB                STOSW
073D:0104 47                INC     DI
073D:0105 47                INC     DI
073D:0106 EBF9              JMP     0101
-R ; look at the registers
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD

Let’s step through this.

LES BX,[SI]

AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
-T ; single step and show register state
AX=FFFF BX=20CD CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=9FFF SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC

LES loads a far pointer from memory. The first two bytes of [SI] will be loaded into BX, and the next two bytes will be loaded into ES.

We’re implicitly using the DS segment here, which is where DOS loaded our program. To be more exact, our program was loaded at DS:0100, whereas DS:0000 (which [SI] points at) contains the Program Segment Prefix. Let’s take a look at it:

-d 0000
073D:0000  CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
[...]
-u 0000
073D:0000 CD20              INT     20
073D:0002 FF9F00EA          CALL    FAR [BX+EA00]
[...]

The first two bytes always contain INT 20, the instruction that quits your program. This means that you can quit your program by jumping to CS:0000 (CS = DS = SS here). DOS also ensures that the word on top of the stack is 0000, so you can quit with a RET. Nifty. It also means that BX will always be set to 20CD, but we don’t really care about that.

The next two bytes hold the segment just past the end of the memory allocated to our program, which usually means the top of conventional memory. By loading them into ES, we make ES point there. On most systems that will be 9FFF. This is very convenient, as the mode 13 framebuffer begins at A0000, or 9FFF:0010. This is a well-known sizecoding trick.
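A minimal Python sketch of that load, using only the four PSP bytes from the dump above. `les` is a hypothetical helper that models nothing but where the two words end up; x86 is little-endian, which is why the bytes CD 20 become BX=20CD.

```python
import struct

def les(memory: bytes, offset: int):
    """Model LES BX,[offset]: BX gets the word at offset, ES the word right after it."""
    bx, es = struct.unpack_from("<HH", memory, offset)   # two little-endian words
    return bx, es

psp = bytes.fromhex("CD20FF9F")    # first four PSP bytes from the dump above
bx, es = les(psp, 0)
print(hex(bx), hex(es))            # 0x20cd 0x9fff
print(hex(es * 0x10 + 0x0010))     # 0xa0000, the mode 13 framebuffer
```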

…except mode 13 is a graphic mode. We’re in mode 3[2], a text mode, and the text buffer is located at B800, completely out of reach of ES. What?

Well, DEBUG fooled us. When you start a program under DOS, SI=0100. Usually. However, for whatever reason, DEBUG zeroes it out instead. You can fix it by running RSI 0100[3] before the first instruction. This is also why the page I’ve linked to uses [BX], as you can count on it actually being zero.

But let’s get back to m8trix. If SI=0100, then [SI] points to the beginning of our program!

-RSI 0100
-R
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0100=1CC4
-T
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-U 100
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW

As you can see, this means that BX=1CC4 (the LES instruction itself), and ES=AB9F. So ES spans AB9F0 to BB9EF, which includes the entire text buffer at B8000!
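The same arithmetic in Python, as a sanity check. The program bytes come from the disassembly above; the 80x25 screen size is an assumption about the usual mode 3 setup, and the variable names are illustrative.

```python
import struct

program = bytes.fromhex("C41C9FAB4747EBF9")      # M8TRIX.COM, loaded at DS:0100
bx, es = struct.unpack_from("<HH", program, 0)   # LES BX,[SI] with SI=0100

seg_first = es * 0x10              # lowest address reachable through ES
seg_last = seg_first + 0xFFFF      # highest address reachable through ES
TEXT_BUFFER = 0xB8000              # B800:0000
TEXT_BYTES = 80 * 25 * 2           # assumed 80x25 cells, two bytes each (decimal)

print(hex(bx), hex(es))                   # 0x1cc4 0xab9f
print(hex(seg_first), hex(seg_last))      # 0xab9f0 0xbb9ef
print(hex(TEXT_BUFFER - seg_first))       # 0xc610, the DI value of the top-left cell
print(TEXT_BUFFER + TEXT_BYTES - 1 <= seg_last)   # True: the whole screen fits
```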

LAHF

AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-t
AX=46FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC

LAHF is pretty straightforward: it just loads the low byte of FLAGS into AH. Except, once again, DEBUG doesn’t set up the FLAGS register the way DOS does. If we were to run m8trix outside of DEBUG, the low byte of FLAGS would be 02, and thus this instruction would set AH=02. This can be fixed in the debugger by running RAX 02FF.
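A small Python model of which bits LAHF copies, showing where the 46 in the trace above comes from. It uses the standard FLAGS layout (CF in bit 0, bit 1 always set, PF in bit 2, AF in bit 4, ZF in bit 6, SF in bit 7); the `lahf` function is only an illustration.

```python
# Bit positions of the flags that LAHF copies (bits 0-7 of FLAGS).
CF_BIT, PF_BIT, AF_BIT, ZF_BIT, SF_BIT = 0, 2, 4, 6, 7
ALWAYS_ONE = 1 << 1   # bit 1 of FLAGS is reserved and always reads as 1

def lahf(cf=0, pf=0, af=0, zf=0, sf=0):
    """Return the value LAHF would put into AH for the given flag bits."""
    return (ALWAYS_ONE
            | cf << CF_BIT | pf << PF_BIT | af << AF_BIT
            | zf << ZF_BIT | sf << SF_BIT)

print(hex(lahf()))             # 0x2: all of these flags clear
print(hex(lahf(zf=1, pf=1)))   # 0x46: DEBUG's "ZR ... PE" state
```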

STOSW

-rax 02FF
-r
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
073D:0103 AB                STOSW
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC

STOSW (“Store (word) string”) is a bit more complex. It writes the word in AX to ES:DI and increments[4] DI by two, the number of bytes written.

This instruction will run over and over, with DI sweeping through the whole segment and wrapping around every once in a while, so everything in ES, including the text buffer, gets overwritten again and again.

Each character in the text buffer is represented by a word (the character in the low byte, its color attribute in the high byte), so each STOSW writes a complete character to the screen. AH=02 sets the color to dark green, and AL (which changes each iteration) chooses the character.
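A quick Python sketch of that layout, assuming the usual 80-column text mode. `cell_offset` and `stosw_bytes` are illustrative names, not anything from the demo.

```python
import struct

COLUMNS = 80   # assumes the usual 80x25 text mode

def cell_offset(row, col):
    """Byte offset of a character cell from the start of the text buffer."""
    return (row * COLUMNS + col) * 2

def stosw_bytes(ax):
    """The two bytes STOSW stores: AL first (the character), then AH (the attribute)."""
    return struct.pack("<H", ax)

print(stosw_bytes(0x0260).hex())                        # 6002: char 60, attribute 02
print(hex(cell_offset(0, 1)), hex(cell_offset(1, 0)))   # 0x2 0xa0
```

m8trix adds 4 to DI per iteration (2 from STOSW, 2 from the INCs), which is exactly two cells, so every other column is skipped.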

skipping a column, misaligned jump

AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
073D:0104 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0003
DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
073D:0105 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
073D:0106 EBF9              JMP     0101
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F

We don’t want the columns to be packed too tightly together, so we skip every other character by adding two bytes to DI.

We then jump to 0101, uncovering a hidden SBB.
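To see why EB F9 lands on 0101, here is the short-jump arithmetic in Python: the displacement byte is signed and is added to the address of the instruction following the jump. `short_jump_target` is just an illustrative helper.

```python
def short_jump_target(address, rel8):
    """Where a 2-byte JMP SHORT (EB rel8) located at `address` lands."""
    displacement = rel8 - 0x100 if rel8 >= 0x80 else rel8   # rel8 is signed
    next_ip = address + 2                                    # relative to the next instruction
    return (next_ip + displacement) & 0xFFFF                 # IP is 16 bits wide

print(hex(short_jump_target(0x0106, 0xF9)))   # 0x101: 0108 minus 7
```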

misaligned jump, SBB

AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
-t
AX=0260 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC

This is the last instruction, and it’s the one that modifies AL to animate the character. It subtracts 9F from AL with borrow, which is pretty much the grade-school approach. That is – if it underflows, it will “borrow” a bit from the next byte by setting the carry flag. The next SBB will see that the carry flag is set, subtract an additional 1, and unset the carry flag (unless it also underflowed).

Let’s see that in practice:
-rax 028F
-r
AX=028F BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
-t
AX=02F0 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE CY
-rip 0101 ; i don't care about the rest of the loop, just run the SBB again
-t
AX=0250 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE NC

Notice how the second SBB subtracted A0 instead of 9F because of the carry flag.
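The same steps replayed in Python. `sbb8` is a hypothetical helper that models only AL and the carry flag; the other flags SBB updates are ignored.

```python
def sbb8(al, imm, cf):
    """Model SBB AL,imm: return the new AL and the new carry flag (borrow)."""
    result = al - imm - cf
    return result & 0xFF, int(result < 0)

# The SBBs traced above: FF -> 60, 8F -> F0 (borrows), F0 with carry -> 50.
for al, cf in [(0xFF, 0), (0x8F, 0), (0xF0, 1)]:
    new_al, new_cf = sbb8(al, 0x9F, cf)
    print(f"AL={al:02X} CF={cf}  ->  AL={new_al:02X} CF={new_cf}")
```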

Why does that matter? Let’s imagine this was a regular SUB instead, without a borrow. 9F is odd (coprime to 100), so it would take 100 iterations for AL to loop around (remember, we’re working with hexadecimal here). Each iteration adds 4 to DI, so the loop runs for 10000/4=4000 iterations before DI repeats, and 4000 is divisible by 100, so each pass would have the exact same AL values for each character. Instead of an animation we’d get a much less impressive static screen.

Instead, AL repeats every 55 (decimal 85) SBB calls, which is coprime to 100, so the AL values will differ from pass to pass. There’s probably a way to determine the period by hand but I just used Python. Not all operands work for this, but 9F seems to be one of the good ones.
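A minimal Python sketch of that kind of check follows. It models only AL and the carry flag and iterates until a state repeats. For a by-hand argument (a sketch, only cross-checked against this simulation): track AL - CF rather than AL. After the first step, every SBB AL,9F changes it by -9F modulo FF, and since gcd(9F, FF) = 3, it takes FF/3 = 55 steps to come back around.

```python
def sbb8(al, imm, cf):
    """Model SBB AL,imm: return the new AL and the new carry flag (borrow)."""
    result = al - imm - cf
    return result & 0xFF, int(result < 0)

def cycle_length(step, state):
    """Apply `step` until some state repeats; return the length of the cycle."""
    seen = {}
    index = 0
    while state not in seen:
        seen[state] = index
        state = step(state)
        index += 1
    return index - seen[state]

sub_period = cycle_length(lambda al: (al - 0x9F) & 0xFF, 0xFF)          # plain SUB AL,9F
sbb_period = cycle_length(lambda s: sbb8(s[0], 0x9F, s[1]), (0xFF, 0))  # state is (AL, CF)

PASS = 0x10000 // 4   # iterations before DI repeats (DI += 4 per iteration)
print(hex(sub_period), hex(sbb_period))        # 0x100 0x55
print(PASS % sub_period, PASS % sbb_period)    # 0 64
```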

To quote the author, “without CR flag, there would be no animation ;)” (CR meaning the carry flag).

ending remarks

I think I’ve explained every aspect of how m8trix works by now. I don’t think I need to tell you how brilliant it is.

Notice how the third byte has three different meanings! At first it’s read as the low byte of the segment loaded into ES, then it’s executed as the LAHF instruction, and then it’s the operand for the SBB.

STOSW is not only the perfect instruction for writing characters in text mode, its opcode (AB) also works as the high byte of the segment that you need to write those characters in the first place.

Everything fits together so nicely :)

m7trix

Soon after m8trix was published, several people tried coming up with ideas to shrink it down even more. What follows is the final version HellMood published:

C:\M8TRIX>debug M7TRIX.COM
-U
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
073D:0104 91                XCHG    AX,CX
073D:0105 EBFA              JMP     0101
-U 101
073D:0101 1C9F              SBB     AL,9F
073D:0103 AB                STOSW
073D:0104 91                XCHG    AX,CX
073D:0105 EBFA              JMP     0101

Not only is this version smaller, it also looks better, as it clears the screen! It’s also simple enough that I won’t bother tracing through it again.

In short: instead of skipping over every other column, we swap AX and CX back and forth. Both are running the same character animation, but, as CH=00, every other column is rendered as black on black, so the characters are invisible. This takes care of both skipping columns AND clearing the screen.

The character cycle is apparently[5] different because the carry flag gets reused between odd and even columns, but the period still works out to be 55 (decimal 85), which I find interesting, but I don’t really feel like researching why that is.
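A rough Python model of the m7trix loop shows the XCHG trick at work. The starting values are assumptions (AX=02FF as after LAHF in the DEBUG session, CX=0007 since DEBUG puts the file size there, carry clear); the only thing that really matters is CH=00. It also counts how many pairs of loop iterations it takes for the state to repeat. One plausible explanation for the unchanged period, offered as a guess: two consecutive SBBs that pass the carry between them behave like a single 16-bit subtraction of 9F9F, with the final borrow fed into the next pair, and that cycle also comes out at 55 steps.

```python
def sbb8(al, imm, cf):
    """Model SBB AL,imm: return the new AL and the new carry flag (borrow)."""
    result = al - imm - cf
    return result & 0xFF, int(result < 0)

def loop_body(state):
    """SBB AL,9F / STOSW / XCHG AX,CX. Returns (new state, word stored)."""
    ax, cx, cf = state
    al, cf = sbb8(ax & 0xFF, 0x9F, cf)
    ax = (ax & 0xFF00) | al
    return (cx, ax, cf), ax        # STOSW stored AX, then XCHG swaps AX and CX

def two_passes(state):
    """Two loop iterations, so the same register is back in AX."""
    state, _ = loop_body(state)
    state, _ = loop_body(state)
    return state

def cycle_length(step, state):
    """Apply `step` until some state repeats; return the length of the cycle."""
    seen = {}
    index = 0
    while state not in seen:
        seen[state] = index
        state = step(state)
        index += 1
    return index - seen[state]

# Assumed starting values: AX=02FF after LAHF, CX=0007 (CH=00), CF clear.
ax, cx, cf = 0x02FF, 0x0007, 0
stored = [ax]                  # the STOSW at 0103 runs once before the loop
start = (cx, ax, cf)           # ...followed by the first XCHG AX,CX
state = start
for _ in range(7):
    state, word = loop_body(state)
    stored.append(word)

print([f"{w >> 8:02X}" for w in stored])   # attributes alternate 02, 00, 02, ...
period = cycle_length(two_passes, start)
print(hex(period))                         # 0x55 pairs of iterations
```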

bonus: simplified version

This is a slightly modified version that works under DEBUG and doesn’t use misaligned jumps. It’s easy to experiment with as you can just load it into DEBUG, use the assembler to change a single instruction, and see what happens.

073D:0100 BB9FAB            MOV     BX,AB9F
073D:0103 8EC3              MOV     ES,BX
073D:0105 B402              MOV     AH,02
073D:0107 AB                STOSW
073D:0108 47                INC     DI
073D:0109 47                INC     DI
073D:010A 1C9F              SBB     AL,9F
073D:010C EBF9              JMP     0107

  1. At least I think so, but I’m not sure.

  2. MOV AH, 0F; INT 10, and look at the registers. AL is the current mode.

  3. No, RSI doesn’t stand for the 64-bit register. R is the register command, which accepts SI as the argument.

  4. If the direction flag was set, it would instead decrement it.

  5. The Python script I’m using for testing says so, but I can’t really tell if that’s true by just looking at the output.